Controlled shuffling, statistical confidentiality and microdata utility: a successful experiment with a 10% household sample of the 2011 population census of Ireland for the IPUMS-International database
Abstract
IPUMS-International disseminates more than two hundred fifty integrated, confidentialized census microdata samples to thousands of researchers worldwide at no cost. The number of samples is increasing at the rate of several dozen per year, as quickly as the task of integrating metadata and microdata is completed. Protecting the statistical confidentiality and privacy of individuals represented in the microdata is a sine qua non of the IPUMS project. For the 2010 round of censuses, even greater protections are required, while researchers are demanding ever higher precision and utility. This paper describes a tripartite collaborative experiment using a ten percent household sample of the 2011 census of Ireland to estimate risk, mask the microdata using controlled shuffling, and assess analytical utility by comparing the masked data against the unprotected source microdata. Controlled shuffling exploits hierarchically ordered coding schemes to protect privacy and enhance utility. With controlled shuffling, the lesson seems to be that more detail means less risk and greater utility. Overall, despite substantial perturbation of the masked dataset (30% of adults perturbed on one or more characteristics), we find that data utility is very high and information loss is slight, even for fairly complex analytical problems.
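To make the idea of exploiting a hierarchically ordered coding scheme concrete, the following is a minimal sketch and not the controlled shuffling procedure used in the paper. It substitutes a plain random permutation of detailed codes among records that share the same coarse (first-digit) group, so the marginal distribution and the coarse classification are preserved exactly while individual records are perturbed. The variable name `occ`, the three-digit codes, and the helper names `shuffle_within_groups` and `share_changed` are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def shuffle_within_groups(records, key, group_of, seed=0):
    """Permute the values of `key` among records in the same coarse group.

    Illustrative stand-in only: a within-group random permutation,
    not the paper's controlled shuffling algorithm.
    """
    rng = random.Random(seed)
    # Collect record positions by the coarse group of their detailed code.
    groups = defaultdict(list)
    for i, rec in enumerate(records):
        groups[group_of(rec[key])].append(i)
    # Copy the records so the source data stay untouched.
    masked = [dict(rec) for rec in records]
    for idx in groups.values():
        values = [records[i][key] for i in idx]
        rng.shuffle(values)
        for i, value in zip(idx, values):
            masked[i][key] = value
    return masked


def share_changed(original, masked, key):
    """Fraction of records whose value of `key` differs after masking."""
    changed = sum(o[key] != m[key] for o, m in zip(original, masked))
    return changed / len(original)


# Hypothetical 3-digit codes; the first digit is the major group.
codes = [111, 112, 113, 121, 211, 212, 213, 311]
records = [{"id": i, "occ": code} for i, code in enumerate(codes)]
masked = shuffle_within_groups(records, "occ", group_of=lambda c: c // 100)

# Utility check: the marginal distribution of codes is unchanged.
print(Counter(r["occ"] for r in masked) == Counter(r["occ"] for r in records))
# Perturbation check: share of records whose detailed code changed.
print(share_changed(records, masked, "occ"))
```

In this toy version, finer codes give the permutation more distinct values to exchange within each group, which loosely mirrors the abstract's remark that more detail can mean less risk without loss of utility at the coarse level.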
Similar resources
Working Paper ENGLISH ONLY UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE (UNECE) CONFERENCE OF EUROPEAN STATISTICIANS EUROPEAN COMMISSION STATISTICAL OFFICE OF THE EUROPEAN
IPUMS-International disseminates more than two hundred integrated, confidentialized census microdata samples to thousands of researchers worldwide at no cost. The number of samples is increasing at the rate of several dozen per year, as the process of integrating metadata and microdata is completed. Protecting the statistical confidentiality and privacy of individuals represented in the microda...
When Excessive Perturbation Goes Wrong and Why IPUMS-International Relies Instead on Sampling, Suppression, Swapping, and Other Minimally Harmful Methods to Protect Privacy of Census Microdata
IPUMS-International disseminates population census microdata at no cost for 69 countries. Currently, a series of 212 samples totaling almost a half billion person records are available to researchers. Registration is required for researchers to gain access to the microdata. Statistics from Google Analytics show that IPUMS-International's lengthy, probing registration form is an effective deterr...
Creating Statistically Literate Global Citizens: The Use of IPUMS-International Integrated Census Microdata in Teaching.
Census microdata are ideal for developing statistical literacy of university students. Access, particularly to internationally comparable microdata, has been a significant obstacle. The IPUMS-International project offers a uniform solution to providing access for policy analysts, researchers, and students to integrated microdata and metadata, while protecting statistical confidentiality. Eighty...
The Ipums Collaboration: Integrating and Disseminating the World's Population Microdata.
The Integrated Public Use Microdata Series (IPUMS) International partnership is a project of the Minnesota Population Center and national statistical agencies, dedicated to collecting and distributing census data from around the world. IPUMS is currently disseminating data on over a half-billion persons enumerated in more than 250 census samples from 79 countries. The data series includes infor...
IPUMS-International High Precision Population Census Microdata Samples: Balancing the Privacy-Quality Tradeoff by Means of Restricted Access Extracts
A breakthrough in the tradeoff between privacy and data quality has been achieved for restricted access to population census microdata samples. The IPUMS-International website, as of June 2006, offers integrated microdata for 47 censuses, totaling more than 140 million person records, with 13 countries represented. Over the next four years, the global collaboratory led by the Minnesota Populati...